Abstract: This paper investigates the construction of deterministic
matrices that preserve the entropy of random vectors with a given
probability distribution. In particular, it is shown that for random
vectors with i.i.d. discrete components, this is achieved by selecting
a subset of rows of a Hadamard matrix such that (i) the selection is
deterministic and (ii) the fraction of selected rows is vanishing. In
contrast, it is shown that for random vectors with i.i.d. continuous
components, no partial Hadamard matrix of reduced dimension preserves
the entropy. These results agree with the results of Wu and Verdú on
almost lossless analog compression. This paper is, however, motivated
by the complexity attribute of Hadamard matrices, which allows the use
of efficient and stable reconstruction algorithms. The proof technique
is based on a polar code martingale argument and on a new entropy
power inequality for integer-valued random variables.

Index Terms: Entropy-preserving matrices, Analog compression,
Compressed sensing, Entropy power inequality.

I. Introduction 

Information theory has extensively studied the lossless and lossy
compression of discrete-time signals into digital sequences. These
problems are motivated by the model of Shannon, where an analog signal
is first acquired by sampling it at a high enough rate to preserve all
of its information (Nyquist-Shannon sampling theorem), and then
compressed. More recently, it was realized that "joint
sensing-compression" schemes can be beneficial. In particular,
compressed sensing introduces the perspective that sparse signals can
be compressively sensed to decrease the measurement rate. As with joint
source-channel coding schemes, one may wonder why this would be useful:
eventually, the signal is represented with the same number of bits, so
why would it be preferable to proceed jointly rather than separately?
In a nutshell, if measurements are expensive (as, for example, in
certain biomedical applications), then compressed sensing is
beneficial.

From an information-theoretic perspective, compressed sensing can be
viewed as a form of analog-to-analog compression, namely, transforming
a discrete-time signal into a lower-dimensional discrete-time signal
over the reals without "losing information". The key point is that,
since measurements are analog, one may as well pack as much information
as possible into each measurement (whereas in the compression of
discrete signals, a measurement on a larger alphabet is more expensive
than a measurement in bits). However, compressing a vector in
$\mathbb{R}^n$ into a vector in $\mathbb{R}^m$, $m < n$, without
regularity constraints is not an interesting problem, since
$\mathbb{R}^n$ and $\mathbb{R}^m$ have the same cardinality.¹ Hence,
analog compression without any regularity conditions is trivial (as
opposed to compression over finite fields).

Recently, [1] introduced a more reasonable framework to study analog
compression from an information-theoretic perspective. By requiring the
encoder to be linear and the decoder to be Lipschitz continuous, the
fundamental compression limit is shown to be the Rényi information
dimension. The setting of [1] also raises a new interesting problem:
how to reach this limit with low-complexity schemes? In the same way
that coding theory aims at approaching the Shannon limit with
low-complexity schemes, it is a challenging problem to devise efficient
schemes that reach the Rényi dimension. Indeed, in this analog
framework, realizing measurements in a low-complexity manner is at the
heart of the problem: it is rather natural that the Rényi dimension is
the fundamental limit irrespective of complexity considerations, but
without a low-complexity scheme, one may not gain anything by
proceeding with a joint compression-sensing approach. For example, in
compressed sensing, with $O(k \log(n/k))$ instead of $O(k)$
measurements, $k$-sparse signals can be reconstructed using $\ell_1$
minimization, which is a convex optimization problem, rather than
$\ell_0$ minimization, which is intractable [5], [6]. Hence, in
general, complexity requirements may raise the measurement rate.

The scope of this paper is precisely to investigate what measurement
rates can be achieved when the complexity of the sensing matrix is
taken into account, which in turn influences the complexity of the
reconstruction algorithm. Our goal is to consider signals that are
memoryless and drawn from a distribution on $\mathbb{R}$, which may be
purely atomic, purely continuous or mixed. This paper focuses on the
purely atomic and purely continuous cases. It is legitimate to attempt
reaching this goal by borrowing tools from coding theory, in particular
from codes achieving the least compression rates in the discrete
setting. Our approach is based on using Hadamard matrices for the
encoding and developing a counterpart of the polar technique [2], [3]
with arithmetic over $\mathbb{R}$ (or $\mathbb{Z}$ for atomic
distributions) rather than $\mathbb{F}_2$ or $\mathbb{F}_q$. The proof
technique uses the martingale approach of polar codes and a new form of
entropy power inequality for discrete distributions. Rigorous results
are obtained and the sensing matrix construction is deterministic. A
nested property is also investigated, which allows one to adapt the
measurement rate to the sparsity level of the signal.

¹ Of course, such an approach is not practical in many regards: it
would fail dramatically in the presence of noise, it would be
problematic to implement nonlinear measurements, and it would be highly
complex.

Spatially-coupled LDPC codes have recently made it possible to obtain
rigorous results in coding theory. This approach has been exploited by
[4], which proposes the use of spatially coupled matrices for sensing.
In [4], the mixture case is covered and further analysis of the
reconstruction algorithm is provided. However, the sensing matrix is
still random. It is known that randomly truncated Hadamard matrices
have desirable properties for compressed sensing. In this paper, we
show that by knowing the signal distribution, Hadamard matrices can be
truncated deterministically and yet reach lower measurement rates.

II. Related Work 

Let $X_1, X_2, \ldots, X_N$ be i.i.d. discrete random variables taking
values in $\mathcal{X} = \{0, 1, \ldots, q-1\}$ with probability
distribution $p_X$, where $q \in \mathbb{Z}_+$ and $N = 2^n$ for some
positive integer $n$. We use the notation $a_i^j$ for the column vector
$(a_i, a_{i+1}, \ldots, a_j)^t$ and set $a_i^j$ to null if $j < i$. We
also define $[r] = \{i \in \mathbb{Z} : 1 \le i \le r\}$.

In Arikan's source coding [3], $q$ is a prime number and the arithmetic
is over $\mathbb{F}_q$. Defining
$G = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}^{\otimes n}$, where
$\otimes$ denotes the Kronecker product, $Y_1^N = G X_1^N$ over
$\mathbb{F}_q$, and $H_i = H(Y_i \mid Y_1^{i-1})$, $i \in [N]$, as the
conditional entropy of $Y_i$ given $Y_1^{i-1}$, one obtains

\[ \sum_{i=1}^{N} H_i = H(Y_1^N) = H(X_1^N) = N H(X). \]

The polarization phenomenon states that for any $\delta > 0$ and as $n$
goes to infinity,

\[ \frac{\#\{i \in [N] : H_i \in (\delta, 1 - \delta)\}}{N} \to 0, \]

where entropies are computed in base $q$ (in particular, $H(X)$ denotes
the entropy of $X$ in base $q$). This implies that for large $n$, the
values $H_i$, $i \in [N]$, have polarized to 0 or 1. This provides a
compression scheme achieving the least compression rate, since for
every $\delta \in (0, 1)$,

\[ \frac{\#\{i \in [N] : H_i \in (\delta, 1]\}}{N} \to H(X). \tag{1} \]



From another point of view, every $Y_i$ is associated with the $i$-th
row of the matrix $G$, and (1) indicates that the "measurement" rate
required to extract the informative components is close to the entropy
of the source $H(X)$ for large $N$. This gives a good "measurement
matrix" for a given distribution over $\mathbb{F}_q$.

In signal acquisition, measurements are analog. Hence, one can consider
$Y_1^N = G X_1^N$ with arithmetic over the real field and investigate
whether any "polarization phenomenon" occurs. The difference is that,
in this case, the measurement alphabet is unbounded. In particular, the
$H_i$ values are no longer confined to $[0, 1]$. We will show in
Theorem 1 that instead of a polarization phenomenon, where two extremal
states survive, an absorption phenomenon occurs where

\[ \frac{\#\{i \in [N] : H_i > \delta\}}{N} \to 0 \]

as $N$ becomes larger, i.e., the measurement rate tends to 0.
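
As a small worked illustration of the difference between the two arithmetics (an added example, not taken from the original argument), take $N = 2$ and $X_1, X_2$ i.i.d. Bernoulli(1/2), so that $Y_1 = X_1 + X_2$ and $Y_2 = X_2$, assuming $G = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$. With entropies in bits, the chain rule $H_1 + H_2 = 2H(X) = 2$ gives:

```latex
\begin{align*}
\text{over } \mathbb{F}_2:\quad
  & H_1 = H(X_1 \oplus X_2) = 1,
  && H_2 = 2H(X) - H_1 = 1, \\
\text{over } \mathbb{R}:\quad
  & H_1 = H(X_1 + X_2) = H\!\left(\tfrac14, \tfrac12, \tfrac14\right) = \tfrac32,
  && H_2 = 2H(X) - H_1 = \tfrac12 .
\end{align*}
```

Over the reals the first conditional entropy already exceeds 1 bit and the second drops below it, which is the mechanism that the absorption phenomenon amplifies as $N$ grows.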

III. Problem Statement 

Definition 1 (Restricted iso-entropy property). Let $X_1^N$ be discrete
i.i.d. random variables with a marginal distribution $p_X$ supported on
a finite set. The family $\{\Phi_N\}$ of measurement matrices, where
$\Phi_N$ has dimension $m_N \times N$, is $\epsilon$-REP($p_X$) with
measurement rate $\rho$ if

\[ \frac{H(X_1^N \mid \Phi_N X_1^N)}{N} < \epsilon \tag{2} \]

and

\[ \limsup_{N \to \infty} \frac{m_N}{N} = \rho. \]

In general, the labeling $N$ can be any subsequence of $\mathbb{Z}_+$.
We will consider $N = 2^n$, $n \in \mathbb{Z}_+$. Also note that (2) is
equivalent to

\[ H(\Phi_N X_1^N) \le H(X_1^N) \le H(\Phi_N X_1^N) + N\epsilon,
   \quad \text{when } X_i \overset{\text{i.i.d.}}{\sim} p_X, \]

which is similar in form to the RIP condition [5], [6], replacing
energy ($\ell_2$ norm) with entropy, and sparsity with a probabilistic
characterization (which may or may not relate to sparsity).

Definition 2. Let $X_1^N$ be continuous (or mixture) random variables
with probability distribution $p_X$. The family of measurement matrices
$\{\Phi_N\}$ of dimension $m_N \times N$ is $(\epsilon, \gamma)$-REP($p_X$)
with measurement rate $\rho$ if

1) there exists a single-letter quantizer $Q : \mathbb{R} \to \mathbb{Z}$
such that the M.M.S.E. of $X$ given $Q(X)$ is less than $\gamma$,

2) for any $N$,

\[ \frac{H(Q(X_1^N) \mid \Phi_N X_1^N)}{N} < \epsilon, \]

where $Q(X_1^N) = (Q(X_1), Q(X_2), \ldots, Q(X_N))^t$,

3)

\[ \limsup_{N \to \infty} \frac{m_N}{N} = \rho. \]

We address the following questions in this paper: 

1) Given a probability distribution $p_X$ over a finite set, and
$\epsilon > 0$, is there a family of measurement matrices that
is $\epsilon$-REP and has measurement rate $\rho$? What is the set
of all possible $(\epsilon, \rho)$ pairs? Is it possible to construct
a near-optimal family of truncated Hadamard matrices
with a minimal measurement rate? How is the truncation
adapted to the distribution $p_X$?

2) Is it possible to obtain an asymptotic measurement rate
below 1 for continuous distributions?

Remark 1. The RIP notion introduced in [5], [6] is useful in
compressed sensing, since it guarantees a stable $\ell_2$-recovery.
We will consider truncated Hadamard matrices satisfying the REP
condition; since they have a Kronecker structure, we will
obtain a low-complexity reconstruction algorithm. This part is,
however, not emphasized in this paper, and we mainly focus on
the construction of the truncated Hadamard matrices. Section
VI provides numerical simulations of a divide-and-conquer
ML decoding algorithm and illustrates the robustness of the
recovery to noise. In future work, we will investigate the use
of a recovery algorithm à la [4].

IV. Main Results 
The main results of this paper are summarized here. 



Definition 3. Let
$\{J_N = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}^{\otimes n},
N = 2^n, n \in \mathbb{Z}_+\}$
be the family of Hadamard matrices. Suppose $X_1^N$ are i.i.d. random
variables with distribution $p_X$ over a finite subset of $\mathbb{Z}$.
Let $Y_1^N = J_N X_1^N$ and define $H_i = H(Y_i \mid Y_1^{i-1})$ and
$m_N = \#\{i \in [N] : H_i > \epsilon\}$. The $(\epsilon, p_X)$-truncated
Hadamard family $\{\bar{J}_N\}$ is the set of matrices of dimension
$m_N \times N$ obtained by selecting those rows of $J_N$ with
$H_i > \epsilon$.
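
The construction in Definition 3 can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the helper names `hadamard` and `truncate` are hypothetical, and the conditional entropies $H_i$ are assumed to be computed elsewhere and passed in as a list.

```python
def hadamard(n):
    """Return J_N = [[1, 1], [-1, 1]]^{(x) n} as a list of rows, N = 2**n."""
    base = [[1, 1], [-1, 1]]
    matrix = [[1]]
    for _ in range(n):
        # One Kronecker product step: matrix (x) base.
        matrix = [[a * b for a in row for b in base_row]
                  for row in matrix for base_row in base]
    return matrix


def truncate(matrix, entropies, eps):
    """Keep the rows whose conditional entropy H_i exceeds eps.

    `entropies` is assumed to be precomputed, one value per row.
    Returns the kept row indices (0-based) and the truncated matrix.
    """
    kept = [i for i, h in enumerate(entropies) if h > eps]
    return kept, [matrix[i] for i in kept]


if __name__ == "__main__":
    J = hadamard(3)
    N = len(J)
    # Rows of J_N are mutually orthogonal: J J^t = N * identity.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in J] for r1 in J]
    print(all(gram[i][j] == (N if i == j else 0)
              for i in range(N) for j in range(N)))
```

The orthogonality check reflects the reason, mentioned later in Remark 4, for preferring $J_N$ over $G_N$ in applications.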

Theorem 1 (Absorption phenomenon). Let $X$ be a discrete random
variable with a probability distribution $p_X$ supported on a finite
subset of $\mathbb{Z}$. For a fixed $\epsilon > 0$, the family of
$(\epsilon, p_X)$-truncated Hadamard matrices
$\{\bar{J}_N, N = 2^n, n \in \mathbb{Z}_+\}$ (defined above) is
$\epsilon$-REP($p_X$) with measurement rate 0. In other words,

\[ \limsup_{N \to \infty} \frac{m_N}{N} = 0. \]

Remark 2. Although all of the measurement matrices $\bar{J}_N$ are
constructed by truncating the matrices $J_N$, the order and the number
$m_N$ of the rows selected to construct $\bar{J}_N$ depend on the
distribution $p_X$.

The proof idea is to construct a conditional entropy martingale
process, similar to [3], which is bounded from below and hence
converges almost surely. Then, the following "entropy power
inequality", which we prove in Subsection V-A, is used to show the
convergence to 0.

Theorem 2 (An EPI over $\mathbb{Z}$). For every probability
distribution $p$ over $\mathbb{Z}$,

\[ H(p \star p) - H(p) \ge g(H(p)), \tag{3} \]

where $g : \mathbb{R}_+ \to \mathbb{R}_+$ is strictly increasing,
$\lim_{x \to \infty} g(x) = \frac{1}{8\log(2)}$, and $g(x) = 0$ if and
only if $x = 0$.

Remark 3. This theorem complements the work in [7] to obtain an entropy
power inequality for discrete random variables.

For continuous distributions, and for any fixed distortion $\gamma$,
the measurement rate approaches 1 as $\epsilon$ tends to 0. This result
has been shown in [1] in a more general context. We recover this result
in our setting for the case of a uniform distribution over $[-1, 1]$;
the proof is given in the Appendix.

Lemma 1. Let $p_U$ be the uniform distribution over $[-1, 1]$ and let
$Q : [-1, 1] \to \{0, 1, \ldots, q-1\}$ be a uniform quantizer for $X$
with M.M.S.E. less than $\gamma$. Assume that $\{\Phi_N\}$ is a family
of full-rank measurement matrices of dimension $m_N \times N$. If
$\{\Phi_N\}$ is $(\epsilon, \gamma)$-REP($p_U$), then the measurement
rate $\rho$ goes to 1 as $\epsilon$ tends to 0.



V. Proof Overview 
A. An EPI for discrete random variables 

In this section we prove the EPI result stated in Theorem 2. If $p$ is
a probability distribution over a finite set $\{0, 1, 2, \ldots, q-1\}$,
then from the continuity and strict concavity of the entropy function,
there is always a guaranteed gap between $H(p \star p)$ and $H(p)$, and
the gap is 0 if and only if $H(p) = 0$. Theorem 2 shows that this gap
is uniformly bounded away from 0 over all distributions on $\mathbb{Z}$
with the same nonzero entropy.

If $X$ and $Y$ are two real-valued, continuous and independent random
variables, then

\[ 2^{2h(X+Y)} \ge 2^{2h(X)} + 2^{2h(Y)}, \tag{4} \]

where $h$ denotes the differential entropy. Equality holds if and only
if $X$ and $Y$ are Gaussian random variables. If $X$ and $Y$ have the
same density $p$, then (4) becomes

\[ h(p \star p) \ge h(p) + \tfrac{1}{2}, \]

which implies a guaranteed increase of the differential entropy for
i.i.d. random variables. For this reason we call (3) an EPI for
discrete random variables.

Lemma 2. Let $c > 0$ and suppose $p$ is a probability measure over
$\mathbb{Z}$ such that $H(p) = c$. Then, for any $i \in \mathbb{Z}$,

\[ H(p \star p) - c \ge c\,p_i - (1 + p_i) h_2(p_i), \]

where $h_2(x) = -x \log_2(x) - (1 - x) \log_2(1 - x)$ is the binary
entropy function and $p_i$ denotes the probability of $i$.

Proof: For a finite positive measure $\nu$ on $\mathbb{Z}$, define
$H(\nu) = -\sum_{i \in \mathbb{Z}} \nu_i \log_2 \nu_i$. Note that for
$\gamma > 0$ and a probability measure $\nu$, we have

\[ H(\gamma\nu) = L(\gamma) + \gamma H(\nu), \]

where $L(\gamma) = -\gamma \log_2 \gamma$. Let $i \in \mathbb{Z}$,
$x = p_i$ and let us write

\[ p = x\delta_i + (1 - x)\nu, \]

where $\nu$ is a probability measure on $\mathbb{Z} \setminus \{i\}$.
Note that

\[ H(p) = h_2(x) + (1 - x)H(\nu) = c, \]

where $h_2$ denotes the binary entropy function. We also have

\[ p \star p = x^2 \delta_{2i} + 2x(1 - x)\nu_i + (1 - x)^2\, \nu \star \nu, \]

where for $k \in \mathbb{Z}$, $\nu_i(k) = \nu(k - i)$. By concavity of
the entropy and $H(\nu \star \nu) \ge H(\nu)$,

\begin{align*}
H(p \star p) &\ge 2x(1 - x)H(\nu) + (1 - x)^2 H(\nu \star \nu) \\
             &\ge (1 - x^2)H(\nu) \\
             &= (1 + x)(c - h_2(x)).
\end{align*}

Hence, $H(p \star p) - c \ge cx - (1 + x)h_2(x)$. ■
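
The bound of Lemma 2 can be checked numerically on small examples. The sketch below is only an illustration under stated assumptions: distributions are represented as Python dictionaries mapping integers to probabilities, and the helper names (`convolve`, `lemma2_sides`) are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a dict {integer: probability}."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def convolve(p, q):
    """Distribution of the sum of independent variables with laws p and q."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out


def lemma2_sides(p):
    """Return (H(p*p) - c, max_i [c p_i - (1 + p_i) h2(p_i)]) with c = H(p)."""
    c = entropy(p)
    lhs = entropy(convolve(p, p)) - c
    rhs = max(c * x - (1 + x) * h2(x) for x in p.values())
    return lhs, rhs


if __name__ == "__main__":
    for p in ({0: 0.5, 1: 0.5}, {0: 0.9, 1: 0.05, 5: 0.05}):
        lhs, rhs = lemma2_sides(p)
        print(lhs >= rhs)
```

Taking the maximum over $i$ is legitimate because the lemma holds for every atom; note that the right-hand side can be negative, in which case the bound is trivially satisfied.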

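Lemma 3. Let $c > 0$, $0 < \alpha \le \frac{1}{2}$ and $n \in \mathbb{Z}$.
Assume that $p$ is a probability measure on $\mathbb{Z}$ such that
$\alpha \le p((-\infty, n]) \le 1 - \alpha$ and $H(p) = c$. Then

\[ \|p \star p_1 - p \star p_2\|_1 \ge 2\alpha, \]

where $p_1 = \frac{1}{p((-\infty, n])}\, p|_{(-\infty, n]}$ and
$p_2 = \frac{1}{p([n+1, \infty))}\, p|_{[n+1, \infty)}$ are scaled
restrictions of $p$ to $(-\infty, n]$ and $[n+1, \infty)$ respectively.

Proof: Let $\alpha_1 = p((-\infty, n])$ and
$\alpha_2 = p([n+1, \infty)) = 1 - \alpha_1$. Note that
$p = \alpha_1 p_1 + \alpha_2 p_2$. We distinguish two cases,
$\alpha_1 \le \frac{1}{2}$ and $\alpha_1 > \frac{1}{2}$. If
$\alpha_1 \le \frac{1}{2}$ then we have

\begin{align*}
\|p \star p_1 - p \star p_2\|_1
 &= \|\alpha_1 p_1 \star p_1 - (1 - \alpha_1) p_2 \star p_2 + (1 - 2\alpha_1) p_1 \star p_2\|_1 \\
 &\ge \|\alpha_1 p_1 \star p_1 - (1 - \alpha_1) p_2 \star p_2\|_1 - (1 - 2\alpha_1)\|p_1 \star p_2\|_1 \\
 &= \alpha_1 + (1 - \alpha_1) - (1 - 2\alpha_1) = 2\alpha_1 \ge 2\alpha,
\end{align*}

whereas if $\alpha_1 > \frac{1}{2}$ we have

\begin{align*}
\|p \star p_1 - p \star p_2\|_1
 &= \|\alpha_1 p_1 \star p_1 - (1 - \alpha_1) p_2 \star p_2 + (1 - 2\alpha_1) p_1 \star p_2\|_1 \\
 &\ge \|\alpha_1 p_1 \star p_1 - (1 - \alpha_1) p_2 \star p_2\|_1 - (2\alpha_1 - 1)\|p_1 \star p_2\|_1 \\
 &= \alpha_1 + (1 - \alpha_1) - (2\alpha_1 - 1) = 2(1 - \alpha_1) \ge 2\alpha,
\end{align*}

where we used the triangle inequality, $1 - \alpha_1 \ge \alpha$, and
the fact that $p_1 \star p_1$ and $p_2 \star p_2$ have non-overlapping
supports, so the $\ell_1$-norm of the sum is equal to the sum of the
corresponding $\ell_1$-norms. ■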

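Lemma 4. Assuming the hypotheses of Lemma 3,

\[ H(p \star p) - c \ge \frac{\alpha^2}{2\log(2)} \|p \star p_1 - p \star p_2\|_1^2. \]

Proof: Let $\alpha_1$ and $\alpha_2$ be the same as in Lemma 3. Let
$\nu_1 = p_1 \star p$, $\nu_2 = p_2 \star p$, and for $x \in [0, 1]$,
define $\mu_x = x\nu_1 + (1 - x)\nu_2$ and $f(x) = H(\mu_x)$. One
obtains

\begin{align*}
f'(x)  &= -\sum_i (\nu_{1i} - \nu_{2i}) \log_2(\mu_{xi}), \\
f''(x) &= -\frac{1}{\log(2)} \sum_i \frac{(\nu_{1i} - \nu_{2i})^2}{\mu_{xi}} < 0.
\end{align*}

Hence $f(x)$ is a concave function of $x$. Moreover,

\begin{align*}
f'(0) &= D(\nu_1 \| \nu_2) + H(\nu_1) - H(\nu_2), \\
f'(1) &= -D(\nu_2 \| \nu_1) + H(\nu_1) - H(\nu_2).
\end{align*}

Since $p_1$ and $p_2$ have separate supports, there are $i, j$ such
that $\nu_{1i} = 0$, $\nu_{2i} > 0$ and $\nu_{1j} > 0$, $\nu_{2j} = 0$.
Hence $D(\nu_1 \| \nu_2)$ and $D(\nu_2 \| \nu_1)$ are both equal to
infinity. In other words,

\begin{align*}
f'(0) &= +\infty, \\
f'(1) &= -\infty.
\end{align*}

Hence the unique maximum of the function must occur between 0 and 1.
Assume that for fixed $\nu_1$ and $\nu_2$, $x^*$ is the maximizer. If
$0 < \alpha_1 \le x^*$ then

\[ \alpha_1 f'(\alpha_1) = \sum_i \alpha_1 (\nu_{2i} - \nu_{1i}) \log_2(\mu_{\alpha_1 i}) \ge 0, \]

which implies that

\begin{align*}
f(\alpha_1) &= -\sum_i \big(\nu_{2i} + \alpha_1(\nu_{1i} - \nu_{2i})\big) \log_2(\mu_{\alpha_1 i}) \\
 &\ge -\sum_i \nu_{2i} \log_2(\mu_{\alpha_1 i}) \\
 &= H(\nu_2) + D(\nu_2 \| \mu_{\alpha_1}) \\
 &\ge H(p) + \frac{1}{2\log(2)} \|\nu_2 - \mu_{\alpha_1}\|_1^2 \\
 &= H(p) + \frac{\alpha_1^2}{2\log(2)} \|\nu_1 - \nu_2\|_1^2,
\end{align*}

where we used $H(\nu_2) = H(p_2 \star p) \ge H(p)$ and Pinsker's
inequality

\[ D(r \| s) \ge \frac{1}{2\log(2)} \|r - s\|_1^2. \]

Similarly, we can show that if $x^* \le \alpha_1 < 1$ then

\[ f(\alpha_1) \ge H(p) + \frac{(1 - \alpha_1)^2}{2\log(2)} \|\nu_1 - \nu_2\|_1^2. \]

As $\alpha \le \alpha_1 \le 1 - \alpha$ and $\alpha \le \frac{1}{2}$, it
results that

\begin{align*}
H(p \star p) &= H(\alpha_1\, p \star p_1 + (1 - \alpha_1)\, p \star p_2) \\
 &= f(\alpha_1) \\
 &\ge H(p) + \frac{\alpha^2}{2\log(2)} \|\nu_1 - \nu_2\|_1^2 \\
 &= c + \frac{\alpha^2}{2\log(2)} \|\nu_1 - \nu_2\|_1^2. \qquad ■
\end{align*}

Lemma 5. Assuming the hypotheses of Lemma 3,

\[ H(p \star p) - c \ge \frac{2\alpha^4}{\log(2)}. \]

Proof: Combine Lemmas 3 and 4. ■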
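
Lemmas 3 to 5 lend themselves to a quick numerical sanity check. The following sketch is an illustration only; it assumes distributions stored as dictionaries, takes $\alpha = \min(\alpha_1, 1 - \alpha_1)$ for a chosen split point $n$, and uses the hypothetical helper name `split_bounds`. Here `math.log(2)` is the natural logarithm, matching the constant in Pinsker's inequality for entropies in bits.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a dict {integer: probability}."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def convolve(p, q):
    """Distribution of the sum of independent variables with laws p and q."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out


def split_bounds(p, n):
    """Return (alpha, ||p*p1 - p*p2||_1, H(p*p) - H(p)) for the split at n.

    Assumes p puts positive mass on both (-inf, n] and [n+1, inf).
    """
    a1 = sum(v for k, v in p.items() if k <= n)
    a2 = sum(v for k, v in p.items() if k > n)
    p1 = {k: v / a1 for k, v in p.items() if k <= n}  # scaled restriction
    p2 = {k: v / a2 for k, v in p.items() if k > n}
    d1, d2 = convolve(p, p1), convolve(p, p2)
    l1 = sum(abs(d1.get(k, 0.0) - d2.get(k, 0.0)) for k in set(d1) | set(d2))
    gap = entropy(convolve(p, p)) - entropy(p)
    return min(a1, a2), l1, gap


if __name__ == "__main__":
    alpha, l1, gap = split_bounds({-1: 0.2, 0: 0.3, 2: 0.1, 3: 0.4}, 0)
    print(l1 >= 2 * alpha)                                  # Lemma 3
    print(gap >= alpha ** 2 / (2 * math.log(2)) * l1 ** 2)  # Lemma 4
    print(gap >= 2 * alpha ** 4 / math.log(2))              # Lemma 5
```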

Proof of Theorem 2: Suppose that $p$ is a distribution over
$\mathbb{Z}$ with $H(p) = c$. Set $y = \|p\|_\infty$. It is easy to see
that $y \ge 2^{-c}$. Also there is an $\alpha > 0$ and an integer $n$
such that $\alpha \le p((-\infty, n]) \le 1 - \alpha$; one can take
$\alpha = \frac{1 - y}{2}$, since the cumulative distribution function
of $p$ has jumps of size at most $y$. Using Lemma 2 and Lemma 5, it
results that $H(p \star p) - c \ge t(c)$ where

\[ t(c) = \min_{y \in [2^{-c}, 1]} \max\!\left( \frac{(1 - y)^4}{8\log(2)},\; cy - (1 + y)h_2(y) \right). \]

For simplicity we consider

\[ g(c) = \min_{y \in [0, 1]} \max\!\left( \frac{(1 - y)^4}{8\log(2)},\; cy - (1 + y)h_2(y) \right), \]

which is less than or equal to $t(c)$. It is easy to check that $g(c)$
is a continuous function of $c$. The monotonicity of $g$ follows from
the fact that $cy - (1 + y)h_2(y)$ is an increasing function of $c$ for
every $y \in [0, 1]$. For strict positivity, note that $(1 - y)^4$ is
strictly positive for $y \in [0, 1)$ and it is 0 when $y = 1$, but
$\lim_{y \to 1} cy - (1 + y)h_2(y) = c$. Hence for $c > 0$, $g(c) > 0$.
If $c = 0$ then

\[ \max\!\left( \frac{(1 - y)^4}{8\log(2)},\; cy - (1 + y)h_2(y) \right) = \frac{(1 - y)^4}{8\log(2)} \]

and its minimum over $[0, 1]$ is 0.

For the asymptotic behavior, note that at $y = 0$,
$cy - (1 + y)h_2(y) = 0$ and
$\frac{(1 - y)^4}{8\log(2)} = \frac{1}{8\log(2)}$. Hence it results that
$g(c) \le \frac{1}{8\log(2)}$ for any $c > 0$. Also, for any
$\epsilon > 0$ there exists a $c_0$ such that for any $c > c_0$ and any
$\epsilon \le y \le 1$, $cy - (1 + y)h_2(y) \ge \frac{1}{8\log(2)}$.
Thus for any $\epsilon > 0$ there is a $c_0$ such that for $c > c_0$,
the outer minimum over $y$ in $g(c)$ is achieved on $[0, \epsilon]$.
Hence, for any $c > c_0$, $g(c) \ge \frac{(1 - \epsilon)^4}{8\log(2)}$.
This implies that for every $\epsilon > 0$,

\[ \frac{1}{8\log(2)} \ge \limsup_{c \to \infty} g(c) \ge \liminf_{c \to \infty} g(c) \ge \frac{(1 - \epsilon)^4}{8\log(2)}, \]

and $\lim_{c \to \infty} g(c) = \frac{1}{8\log(2)}$. ■
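
The function $g$ has no closed form, but it is easy to approximate. The sketch below evaluates the min-max on a uniform grid over $y \in [0, 1]$; the grid size and the function names are assumptions made for illustration, and the result is only an approximation of the true minimum.

```python
import math

# Asymptotic value 1 / (8 log 2), with log the natural logarithm.
LIMIT = 1.0 / (8.0 * math.log(2))


def h2(y):
    """Binary entropy function in bits."""
    if y <= 0.0 or y >= 1.0:
        return 0.0
    return -y * math.log2(y) - (1 - y) * math.log2(1 - y)


def g(c, grid=20001):
    """Grid approximation of min over y in [0,1] of
    max((1 - y)^4 / (8 log 2), c*y - (1 + y) * h2(y))."""
    best = float("inf")
    for k in range(grid):
        y = k / (grid - 1)
        value = max((1 - y) ** 4 * LIMIT, c * y - (1 + y) * h2(y))
        best = min(best, value)
    return best


if __name__ == "__main__":
    for c in (0.0, 0.5, 1.0, 2.0, 5.0, 50.0):
        print(c, g(c), LIMIT)
```

On such a grid the three properties used in the proof are visible directly: the value at $c = 0$ is 0, the values are non-decreasing in $c$, and they stay below $\frac{1}{8\log(2)}$ while approaching it for large $c$.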



Figure 1 shows the EPI gap. As expected, for large values of $H(p)$,
the gap approaches the asymptotic value $\frac{1}{8\log(2)}$. This is
very similar to the EPI bound obtained for continuous random variables,
and we believe that one can improve this asymptotic bound to achieve
$\frac{1}{2}$.

Fig. 1: EPI gap for discrete random variables



B. Conditional Entropy Martingale 

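Assume that $X_1^N$, $N = 2^n$, $n \in \mathbb{Z}_+$, is a set of
i.i.d. random variables with probability distribution $p_X$ over a
finite subset of $\mathbb{Z}$. Let $Y_1^N = J_N X_1^N$, where
$J_N = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}^{\otimes n}$ is
the Hadamard matrix of dimension $N$, and let
$H_i = H(Y_i \mid Y_1^{i-1})$, $i \in [N]$, be the conditional entropy
values.

Lemma 6. Let $X_1^N$ be as in the previous part and let
$Z_1^N = G_N X_1^N$, where
$G_N = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}^{\otimes n}$.
Assume that $\tilde{H}_i = H(Z_i \mid Z_1^{i-1})$; then
$\tilde{H}_i = H_i$, $i \in [N]$.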

Remark 4. The only point of Lemma 6 is that, in applications, it is
preferable to use $J$ because the rows of $J$ are orthogonal to one
another. For simplicity of the proof, we use the $G$ matrices and
relate to the polar code notations [2], [3].

Proof: We prove the lemma by induction over $n$, using the fact that
$J_N$ and $G_N$ have a similar recursive structure as functions of
$J_{N/2}$ and $G_{N/2}$. For simplicity, we prove the lemma in a more
general case. Assume that $Z_i, Y_i$, $i \in [N]$, are as introduced
before. Suppose $\Theta$ is a random element and redefine
$H_i = H(Y_i \mid Y_1^{i-1}, \Theta)$ and
$\tilde{H}_i = H(Z_i \mid Z_1^{i-1}, \Theta)$. We prove that
$H_i = \tilde{H}_i$. By putting $\Theta$ equal to null, we obtain the
proof of the lemma. For $n = 1$ we have

\begin{align*}
H_1 &= H(Y_1 \mid \Theta) \\
    &= H(X_1 + X_2 \mid \Theta) \\
    &= H(Z_1 \mid \Theta) = \tilde{H}_1.
\end{align*}

We also have

\begin{align*}
H(Y_1, Y_2 \mid \Theta) &= H(X_1, X_2 \mid \Theta) \\
                        &= H(Z_1, Z_2 \mid \Theta).
\end{align*}

Hence, from the chain rule for conditional entropy we obtain that
$H_2 = \tilde{H}_2$. Now assume that we have the result for all
$n \le m$ and we prove it for $n = m + 1$. For simplicity, let us
define the following notations:

\begin{align*}
V_1^{(m)} &= X_1^{2^m}, & V_2^{(m)} &= X_{2^m+1}^{2^{m+1}}, \\
A_m &= J_{2^m}, & B_m &= G_{2^m}, \\
R &= B_m V_2^{(m)}, & S &= A_m V_2^{(m)}, \\
T &= A_m (V_2^{(m)} - V_1^{(m)}).
\end{align*}

From the recursive structure of the $J$ and $G$ matrices, the first
$2^m$ components of $Y_1^{2^{m+1}}$ and $Z_1^{2^{m+1}}$ are equal to
$A_m (V_1^{(m)} + V_2^{(m)})$ and $B_m (V_1^{(m)} + V_2^{(m)})$
respectively. The components of $V_1^{(m)} + V_2^{(m)}$ are i.i.d.
random variables. Hence, using the induction hypothesis we obtain that
the first $2^m$ components of $H_i$ and $\tilde{H}_i$ are equal. Now we
prove that for $i = 2^m + 1, \ldots, 2^{m+1}$, they are also equal. For
$2^m + 1 \le i \le 2^{m+1}$, setting $j = i - 2^m$ we have

\begin{align*}
\tilde{H}_i &= H(R_j \mid B_m (V_1^{(m)} + V_2^{(m)}), R_1^{j-1}, \Theta) \\
            &= H(R_j \mid V_1^{(m)} + V_2^{(m)}, R_1^{j-1}, \Theta),
\end{align*}

and

\begin{align*}
H_i &= H(T_j \mid A_m (V_1^{(m)} + V_2^{(m)}), T_1^{j-1}, \Theta) \\
    &= H(T_j \mid V_1^{(m)} + V_2^{(m)}, T_1^{j-1}, \Theta) \\
    &= H(S_j \mid V_1^{(m)} + V_2^{(m)}, S_1^{j-1}, \Theta),
\end{align*}

where we used the invertibility of $A_m$ and $B_m$, and the fact that
$T = 2S - A_m (V_1^{(m)} + V_2^{(m)})$. Setting
$\Theta' = \{V_1^{(m)} + V_2^{(m)}, \Theta\}$ and using the induction
hypothesis, we obtain that for $2^m + 1 \le i \le 2^{m+1}$,
$H_i = \tilde{H}_i$. Hence the induction proof is complete. Now setting
$\Theta$ equal to null, we obtain the proof of Lemma 6. ■

Notice that we can represent $G_N$ in a recursive way. Let us define
two binary operations $\ominus$ and $\oplus$ as follows:

\begin{align*}
\ominus(a, b) &= a + b, \\
\oplus(a, b) &= b,
\end{align*}

where $+$ is the usual integer addition. It is easy to see that the
multiplication by $G_N$ can be carried out recursively. Figure 2 shows
a simple case for $G_4$. The $-$ or $+$ sign on an arrow shows that the
result for that arrow is obtained by applying a $\ominus$ or $\oplus$
operation to two input operands.
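
The recursion can be sketched in a few lines. The code below is an illustrative implementation, not the authors' code; it assumes $G_N = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}^{\otimes n}$ with the first Kronecker factor acting on the two halves of the input, and the names `g_transform` and `g_matrix` are hypothetical.

```python
def g_transform(x):
    """Compute G_N x recursively for N = len(x) = 2**n (integer arithmetic).

    The first half of the output comes from the 'minus' branch (a + b) and
    the second half from the 'plus' branch (b), applied to the two halves
    a, b of the input.
    """
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    a, b = x[:half], x[half:]
    minus_branch = g_transform([u + v for u, v in zip(a, b)])
    plus_branch = g_transform(b)
    return minus_branch + plus_branch


def g_matrix(n):
    """Explicit G_N as a list of rows, used here only to check the recursion."""
    base = [[1, 1], [0, 1]]
    matrix = [[1]]
    for _ in range(n):
        matrix = [[a * b for a in row for b in base_row]
                  for row in matrix for base_row in base]
    return matrix


if __name__ == "__main__":
    x = [3, 1, 4, 1, 5, 9, 2, 6]
    direct = [sum(g * v for g, v in zip(row, x)) for row in g_matrix(3)]
    print(g_transform(x) == direct)
```

The recursion uses on the order of $N \log N$ additions instead of the $N^2$ operations of a direct matrix-vector product, which is the complexity advantage of the Kronecker structure.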



Fig. 2: Recursive structure for multiplication by $G_4$



If we consider a particular output $Y_m$, there is a sequence of
$\oplus$ and $\ominus$ operations on the input random variables which
results in $Y_m$. An easy way to find this sequence of operations is to
write the binary expansion of $m - 1$. Then each 0 in this expansion
corresponds to a $\ominus$ operation and each 1 corresponds to a
$\oplus$ operation. Using this binary labeling, we define a binary
stochastic process. Assume that $\Omega = \{0, 1\}^\infty$, and
$\mathcal{F}$ is the $\sigma$-algebra generated by the cylindrical sets

\[ S_{(i_1, i_2, \ldots, i_s)} = \{\omega \in \Omega \text{ such that } \omega_{i_1} = 1, \ldots, \omega_{i_s} = 1\} \]

for every integer $s$ and $i_1, i_2, \ldots, i_s$. We also define
$\mathcal{F}_n$ as the $\sigma$-algebra generated by the first $n$
coordinates of $\omega$. In other words, $\mathcal{F}_n$ is the
$\sigma$-algebra generated by sets of the form

\[ \{\omega \in \Omega \text{ such that } \omega_1 = i_1, \ldots, \omega_n = i_n\}, \quad i_1, \ldots, i_n \in \{0, 1\}. \]

Let $\mathcal{F}_0 = \{\emptyset, \Omega\}$ be the trivial
$\sigma$-algebra. We also define the uniform probability measure $\mu$
over the cylindrical sets by

\[ \mu(\{\omega \in \Omega : \omega_1 = i_1, \ldots, \omega_n = i_n\}) = \frac{1}{2^n}, \]

which, by the uniformity assumption, is independent of the values taken
by $i_1, \ldots, i_n$. This measure can be uniquely extended to
$\mathcal{F}$. Let $[\omega]_n = \omega_1 \omega_2 \cdots \omega_n$
denote the first $n$ coordinates of $\omega = \omega_1 \omega_2 \cdots$
and let $Y_{[\omega]_n}$ denote the random variable $Y_i$, where the
binary expansion of $i - 1$ is $[\omega]_n$, and let $Y^{[\omega]_n}$
denote

\[ Y^{[\omega]_n} = \{Y_{[\nu]_n} : [\nu]_n < [\omega]_n\}. \]

We also define the random variable $I_n$ by

\[ I_n(\omega) = H(Y_{[\omega]_n} \mid Y^{[\omega]_n}). \tag{5} \]

As an example, if $\omega = 0.10\ldots$ then

\[ I_2(\omega) = H(Y_{10} \mid Y_{01}, Y_{00}) = H(Y_3 \mid Y_1, Y_2). \]

It is also important to note that

\[ I_{n+1}([\omega]_n, 0) = H(Y_{[\omega]_n} + \tilde{Y}_{[\omega]_n} \mid Y^{[\omega]_n}, \tilde{Y}^{[\omega]_n}), \tag{6} \]

where $\tilde{\ }$ denotes an independent copy of the corresponding
random element.

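Theorem 3. $(I_n, \mathcal{F}_n)$ is a martingale.

Proof: $I_n$ is adapted to $\mathcal{F}_n$ by definition. Hence it is
sufficient to show that $E\{I_{n+1} \mid \mathcal{F}_n\} = I_n$. For
simplicity, we prove the case $n = 1$. The general case is similar.
Using Figure 2 we have

\[ T(\omega_1) = E\{I_2(\omega) \mid \mathcal{F}_1\}, \]

which is a function of $\omega_1$. We have

\begin{align*}
T(0) &= \tfrac{1}{2}\big(I_2(0, 0) + I_2(0, 1)\big) \\
     &= \tfrac{1}{2}\big(H(Y_{00}) + H(Y_{01} \mid Y_{00})\big) \\
     &= \tfrac{1}{2} H(Y_{00}, Y_{01}) = \tfrac{1}{2} H(Y_0, \tilde{Y}_0) = H(Y_0).
\end{align*}

We can also show that $T(1) = H(Y_1 \mid Y_0)$. Hence,
$T(\omega_1) = I_1(\omega)$ and $E\{I_2 \mid \mathcal{F}_1\} = I_1$.
Similarly, we can show that
$E\{I_{n+1} \mid \mathcal{F}_n\} = I_n$. ■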
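
For small $N$ both Lemma 6 and the martingale property can be verified by brute force. The sketch below enumerates all inputs, so it is only practical for very small $N$ and small alphabets. The helper names are hypothetical. It assumes that the two children of index $i$ (0-based) at level $n$ are the indices $2i$ and $2i + 1$ at level $n + 1$, consistent with the binary labeling above.

```python
import itertools
import math
from collections import defaultdict


def kron_power(base, n):
    """n-fold Kronecker power of a small integer matrix (list of rows)."""
    matrix = [[1]]
    for _ in range(n):
        matrix = [[a * b for a in row for b in base_row]
                  for row in matrix for base_row in base]
    return matrix


def entropy(dist):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def conditional_entropies(matrix, px):
    """Brute-force H_i = H(Y_i | Y_1^{i-1}) for Y = matrix * X, X i.i.d. ~ px."""
    joint = defaultdict(float)
    for x in itertools.product(px, repeat=len(matrix[0])):
        prob = math.prod(px[s] for s in x)
        y = tuple(sum(m * s for m, s in zip(row, x)) for row in matrix)
        joint[y] += prob
    prefix = [0.0]  # prefix[i] = H(Y_1, ..., Y_i)
    for i in range(1, len(matrix) + 1):
        marginal = defaultdict(float)
        for y, prob in joint.items():
            marginal[y[:i]] += prob
        prefix.append(entropy(marginal))
    return [prefix[i] - prefix[i - 1] for i in range(1, len(prefix))]


if __name__ == "__main__":
    px = {0: 0.9, 1: 0.1}
    G, J = [[1, 1], [0, 1]], [[1, 1], [-1, 1]]
    h_g4 = conditional_entropies(kron_power(G, 2), px)
    h_j4 = conditional_entropies(kron_power(J, 2), px)
    h_g8 = conditional_entropies(kron_power(G, 3), px)
    # Lemma 6: same conditional entropies for G_N and J_N.
    print(max(abs(a - b) for a, b in zip(h_g4, h_j4)))
    # Martingale: the two children average to the parent.
    print(max(abs(h_g8[2 * i] + h_g8[2 * i + 1] - 2 * h_g4[i]) for i in range(4)))
```

Both printed quantities should be zero up to floating-point rounding.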

C. Main Theorem 

In this section, we prove the main theorem of the paper. 

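Proof of Theorem 1: Assume that $Y_1^N = G_N X_1^N$, for $N = 2^n$,
$n \in \mathbb{Z}_+$, and $H_i = H(Y_i \mid Y_1^{i-1})$, $i \in [N]$;
by Lemma 6 these are the same conditional entropies as those obtained
with $J_N$. Also fix $\epsilon > 0$. Let us define

\begin{align*}
K_n &= \{i : i \in [N], H_i > \epsilon\}, \\
Y_{[K_n]} &= \{Y_j : j \in K_n\}.
\end{align*}

Hence, by Definition 3, $|K_n| = m_N$ and $\bar{J}_N$ is obtained from
$J_N$ by selecting the rows with index in $K_n$. We have

\begin{align*}
H(X_1^N \mid \bar{J}_N X_1^N) &= H(X_1^N) - I(X_1^N; \bar{J}_N X_1^N) \\
 &= H(Y_1^N) - H(Y_{[K_n]}) \\
 &= H(Y_{[K_n^c]} \mid Y_{[K_n]}) \\
 &\le \sum_{i \in K_n^c} H(Y_i \mid Y_1^{i-1}) \\
 &\le |K_n^c| \epsilon = (N - m_N)\epsilon,
\end{align*}

which implies that

\[ \frac{H(X_1^N \mid \bar{J}_N X_1^N)}{N} \le \frac{(N - m_N)\epsilon}{N} \le \epsilon. \]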



This shows that the family $\{\bar{J}_N\}$ is $\epsilon$-REP. Now it
remains to show that the measurement rate of this family is 0. To prove
this, we construct the martingale $I_n$ by (5). $I_n$ is a positive
martingale and converges to a random variable $I_\infty$ almost surely.
Our aim is to show that for any two positive numbers $a$ and $b$ where
$a < b$, $\mu(I_\infty \in (a, b)) = 0$, which implies that
$\mu(I_\infty \in \{0, \infty\}) = 1$. Since $I_n$ is a martingale,
$E\{I_n\} = E\{I_0\} = H(X) < \infty$. Using Fatou's lemma we obtain

\[ E\{I_\infty\} \le \liminf_{n \to \infty} E\{I_n\} = H(X_1) < \infty, \]

which implies that $\mu(I_\infty = \infty) = 0$. Hence, $I_n$ converges
almost surely to 0 and it also converges to 0 in probability. In other
words, given $\epsilon > 0$,

\begin{align*}
\limsup_{n \to \infty} \mu(I_n > \epsilon)
 &= \limsup_{n \to \infty} \frac{|K_n|}{2^n} \\
 &= \limsup_{N \to \infty} \frac{m_N}{N} = 0.
\end{align*}

This implies that for a fixed $\epsilon > 0$ the measurement rate
$\rho$ is 0. Now it remains to prove that for any two positive numbers
$a$ and $b$, where $a < b$, $\mu(I_\infty \in (a, b)) = 0$. Fix a
$\delta > 0$; then for every $\omega$ in the convergence set there is
an $n_0$ such that for $n > n_0$,
$|I_{n+1}(\omega) - I_n(\omega)| < \delta$. Using the martingale
property

\[ I_n(\omega) = \tfrac{1}{2}\big(I_{n+1}([\omega]_n, 0) + I_{n+1}([\omega]_n, 1)\big), \]

we obtain that for $n > n_0$,

\[ |I_{n+1}(\omega) - I_n(\omega)| = |I_{n+1}([\omega]_n, 0) - I_n([\omega]_n)| < \delta. \]

Using (6) and the entropy power inequality (3), it results that
$0 \le I_n(\omega) \le \rho(\delta)$, where $\rho(\delta)$ can be
obtained from $g$. This implies that $I_n$ must converge to 0. ■

VI. Numerical Simulations 

For simulation, we use a binary random variable, where
$p_X(0) = 1 - p$ for some $0 < p < \frac{1}{2}$.



A. Absorption Phenomenon 

Figure 3 shows the absorption phenomenon for $p = 0.05$ and
$N = 64, 128, 256, 512$.



B. Nested Property 

The absorption phenomenon is shown in Figure 4 for $N = 512$ and
different values of $p$. It is seen that the high-entropy indices for a
smaller $p$ are included in the high-entropy indices for a larger $p$.
We call this the "nested" property. The benefit of the nested property
is that it allows one to take measurements adaptively if the sparsity
level is unknown. In other words, one takes some measurements
corresponding to the high-entropy indices and, if the recovery is not
successful, refines them by adding extra measurements that correspond
to the indices with lower entropy to improve the quality of recovery.

C. Robustness to Measurement Noise 

Figure 5 shows the stability of the reconstruction algorithm under
Gaussian measurement noise. For the simulation, we used $N = 512$,
$p = 0.05$ and kept all of the indices with entropy greater than 0.01.
In other words, the measurement matrix was 0.01-REP for the binary
distribution with parameter $p$. For recovery, we use an ML decoder
which exploits the recursive structure of the polar code. Let us denote
the input random variables by $X_1^N$ and assume that we keep all of
the rows of the matrix $J_N$ with indices in the set $K$. We define the
SNR (signal-to-noise ratio) at the input of the decoder as:



SNR in = 



\K\a 2 

and the SNR at the output of the decoder as



i N 

SNR 0Ut = -J^EdXt-Xtl 2 ), 



where $\sigma^2$ is the noise variance and $\hat{X}_i$ is the output of
the ML decoder. The result shows a loss of approximately 4 dB in SNR in
the high-SNR regime. Notice that part of this loss results from the
finite distortion 0.01 that we tolerate by removing the measurements
corresponding to low-entropy indices.

Fig. 5: Stability analysis for the ML decoder



Appendix 

Proof of Lemma 1

Let $X_1^N$ be a set of $N$ i.i.d. random variables with a uniform
distribution over $[-1, 1]$. Let $D_i = Q(X_i)$, $i \in [N]$, be the
uniform quantizer output for $X_i$. It is easy to see that we can write
$X_i = \frac{2D_i - q + 1}{q} + C_i$, $i \in [N]$, where $C_i$ is the
quantization noise, which is uniformly distributed over
$[-\frac{1}{q}, \frac{1}{q}]$. Moreover, $C_i$ is independent of $D_i$.
As $\Phi_N$ is full rank, the vector random variable $\Phi_N X_1^N$ has
a well-defined density over $\mathbb{R}^{m_N}$. We have

\begin{align*}
H(D_1^N \mid \Phi_N X_1^N)
 &= H(D_1^N) - I(D_1^N; \Phi_N X_1^N) \\
 &= N\log_2(q) - h(\Phi_N X_1^N) + h(\Phi_N X_1^N \mid D_1^N) \\
 &= N\log_2(q) - h(\Phi_N X_1^N) + h(\Phi_N C_1^N) \\
 &= N\log_2(q) - h(\Phi_N X_1^N) + h(\Phi_N X_1^N) - m_N \log_2(q) \\
 &= (N - m_N)\log_2(q) \le N\epsilon,
\end{align*}

where $I$ denotes the mutual information between two random variables
and $h$ is the differential entropy for continuous distributions. This
implies that

\[ \rho = \limsup_{N \to \infty} \frac{m_N}{N} \ge 1 - \frac{\epsilon}{\log_2(q)}, \]

which gives the desired result as $\epsilon$ goes to 0. In the proof we
used the fact that $\Phi_N X_1^N$ has the same distribution as
$q \times \Phi_N C_1^N$.
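
The decomposition $X_i = \frac{2D_i - q + 1}{q} + C_i$ used in the proof can be illustrated with a short sketch. The function names below are hypothetical and the code assumes a uniform $q$-level quantizer on $[-1, 1]$ whose cells have width $2/q$; `rate_lower_bound` simply evaluates the bound $1 - \epsilon / \log_2(q)$ derived above.

```python
import math
import random


def quantize(x, q):
    """Uniform q-level quantizer on [-1, 1]: returns the cell index D in {0..q-1}."""
    d = int((x + 1) * q / 2)
    return min(d, q - 1)  # x = 1 is assigned to the last cell


def cell_center(d, q):
    """Center (2D - q + 1) / q of quantization cell d."""
    return (2 * d - q + 1) / q


def rate_lower_bound(eps, q):
    """Lower bound 1 - eps / log2(q) on the measurement rate from the proof."""
    return 1.0 - eps / math.log2(q)


if __name__ == "__main__":
    random.seed(0)
    q = 8
    worst = 0.0
    for _ in range(10000):
        x = random.uniform(-1.0, 1.0)
        c = x - cell_center(quantize(x, q), q)  # quantization noise C
        worst = max(worst, abs(c))
    print(worst <= 1.0 / q)
    print(rate_lower_bound(0.01, q))
```

The quantization noise never exceeds $1/q$ in magnitude, and the rate bound approaches 1 as $\epsilon$ decreases for any fixed $q$.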



Fig. 3: Absorption trace for p = 0.05 (panels: N = 64, N = 128, N = 256, N = 512)

Fig. 4: Nested property for N = 512 and different p